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D Abstract 

Scientific studies often require the precise calculation of derivatives. In many cases an analytical calculation is not feasible 
and one resorts to evaluating derivatives numerically. These are error-prone, especially for higher-order derivatives. 
' A technique based on algorithmic differentiation is presented which allows for a precise calculation of higher-order 
,_i ^derivatives. The method can be widely applied even for the case of only numerically solvable, implicit dependencies 
which totally hamper a semi-analytical calculation of the derivatives. As a demonstration the method is applied to a 
quantum field theoretical physical model. The results are compared with standard numerical derivative methods. 
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^ 1. Physical motivation 
00 

In many scientific studies the knowledge of derivatives 
of a given quantity is of particular importance. For ex- 
ample in theoretical physics, especially in thermodynam- 
ics, many quantities of interest require the calculation of 
^-H derivatives of an underlying thermodynamic potential with 
0^ respect to some external parameters such as temperature, 
'volume, or chemical potentials. In many cases the ther- 
^ modynamic potentials can only be evaluated numerically 
and one is forced to employ numerical differentiation tech- 
niques which are error-prone as any numerical methods. 
d Furthermore, the thermodynamic potential has to be eval- 
uated at the physical point defined by minimizing the 
thermodynamic potential with respect to some conden- 
sates yielding the equations of motion (EoM). Generally, 
these equations can be solved only numerically and thus 
introduce additional implicit dependencies which makes 
the derivative calculations even more complicated. 

Even in cases where the thermodynamic potential and 
the implicit dependencies on the external parameters are 
known analytically, the evaluation of higher-order deriva- 
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tives becomes very complex and tedious and in the end 
impedes their explicit calculation. 

In this work we present a novel numerical technique, 
based on algorithmic differentiation (AD) to evaluate deriva- 
tives of arbitrary order of a given quantity at machine 
precision. Compared to other differentiation techniques 
such as the standard divided differentiation (DD) method 
or symbolic differentiation, the AD produces truncation- 
error-free derivatives of a function which is coded in a com- 
puter program. Additionally, AD is fast and reduces the 
work required for analytical calculations and coding, espe- 
cially for higher-order derivatives. Furthermore, the AD 
technique is applicable even if the implicit dependencies 
on the external parameters are known only numerically. 
In Ref. [ij a comprehensive introduction to AD can be 
found. First remarks about the computation of deriva- 
tive of implicitly defined functions were already contained 
in 0. However, a detailed description and analysis is not 
available yet. Additional information about tools and lit- 
erature on AD are available on the web-page of the AD- 
community 

This work is organized in the following way: For illus- 
trations we will introduce an effective model, the so-called 
linear sigma model with quark degrees of freedom in Sec. [21 
This model is widely used for the description of the low- 
energy sector of strongly interacting matter. As a starting 
point the basic thermodynamic grand potential and the 
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EoM of this model are calculated in a simple mean-field 
approximation in order to elucidate the technical problems 
common in such types of calculations. Before we demon- 
strate the power of the AD method by calculating certain 
Taylor expansion coefficients up to very high orders for the 
first time in Sec. [6l the AD method itself and some math- 
ematical details are introduced in Sec. [3l Details for the 
calculation of higher-order derivatives of implicit functions 
are given in the following Sec. 21 In Sec. [5] the results of 
the AD method are confronted with the ones of the stan- 
dard divided differences (DD) method in order to estimate 
the truncation and round off errors. Finally, we end with 
a summary and conclusion in Sec. [71 

2. A model example 

In order to illustrate the key points of the AD method 
we employ a quantum field theoretical model This 
model can be used to investigate the phase structure of 
strongly interacting matter described by the underlying 
theory of Quantum Chromo dynamics (QCD). Details con- 
cerning this effective linear sigma model (LcrM) in the 
QCD context can be found in reviews, see e.g. [1, 0]. 

The quantity of interest for the exploration of the phase 
structure is the grand potential of the LcrM. This ther- 
modynamic potential depends on the temperature T and 
quark chemical potential /j, because the particle number 
can also vary. It is calculated in mean-field approximation 
whose derivation for three quark flavors is shown explicitly 
in 0]. For the LcrM the total grand potential consists 
of two contributions 



(1) 



where the first part, U, stands for the purely mesonic po- 
tential contribution and is a function of two condensates, 
erg and as- The second part, ^Iqq, is the quark contribution 
and depends on the two condensates as well as on the ex- 
ternal parameters temperature T and, for simplicity, only 
one quark chemical potential /i. Since the quark contri- 
bution arises from a momentum-loop integration over the 
quark fields, it is given by an integral which cannot be 
evaluated in closed form analytically. Readers who are 
unfamiliar with the physical details, may simply regard 
Eq. II]) as an only numerically known function and con- 
tinue with the reading above Eq. ([5]), which introduces 
an implicit dependency on the parameters T and whose 
treatment with the AD technique is the major focus of this 
work. 

Explicitly, in mean-field approximation the quark con- 
tribution reads 



nqq=6T J2 j^W~^nAT.P)) 

+ Hl~nqj{T,y))} , (2) 



f—u,d,s ( 



where a summation over three quark fiavors / is included. 
The usual fermionic occupation numbers for the quarks 
are denoted by 



1 



l + expiiEqj^fi)/T) 



(3) 



and for antiquarks by nqj{T,fj,) 



,j{T,—^) respec- 



tively. In this example only two different single-particle 
energies, Eq,i, i — q,s, emerge 



'q,q = ^k^ + {gaq/2) 



2 and E„ 



(4) 

The first index i = q denotes the combination of two mass- 
degenerate light-quark fiavors (u, d) and the other index s 
labels the heavier strange quark fiavor. The expressions 
in parentheses in Eq^i are the corresponding quark masses. 
In this way, the dependency of the grand potential on the 
condensates, cr^, i = q,s enter through the quark masses, 
which has not been indicated explicitly in Eq. 

The mesonic potential does not depend on the quark 
chemical potential nor on the temperature explicitly. It is 
just a function of the two condensates and reads 



Ai 



+ '^^Vs + 8 (2Ai + A2) + - (2Ai + 2A2) , (5) 

wherein all remaining quantities, e.g. m, hq, . . . are con- 
stant parameters. 

Since the physical condensates, a, and CTs, are deter- 
mined by the extrema (minima) of the total grand poten- 
tial with respect to the corresponding fields, they fulfill the 
equations of motion 



dn{T,ii;aq,as)) 



do 







— 

(Ts — O's 



q,s 



(6) 



This in turn introduces an implicit T- and /x-dependence 
of both condensates. 



q,s 



(7) 



These quantities represent the physical order parame- 
ters which, together with the grand potential, are the basis 
of the exploration of the phase structure of the model. We 
denote the grand potential evaluated at ai = ai, i = q,s, 



n{T,^i) = n{T,n;aq{T,n),as{T,n)) 



(8) 



In order to find the temperature and chemical potential 
behavior of the order parameters the integral in Eq. ^ 
and simultaneously the EoM have to be solved numerically. 
This already is an example suitable for an AD application, 
because a derivative of a only numerically solvable, implicit 
function is needed as input. Later we will be interested in 
higher-order derivatives of the grand potential with respect 
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to, e.g., the chemical potential. For example, the quark 
number density at the physical point is defined by 



(9) 



In cases without an implicit T- or /i-dependence in the 
thermodynamic potential some progress can be made by 
calculating the corresponding derivatives explicitly and solv- 
ing the corresponding equations numerically. This might 
be feasible for lower-order derivatives, in particular, if parts 
of the derivative calculations can be performed by some 
computer algebra packages like Mathematica or Maple. 
But for higher-order derivatives this procedure is error- 
prone and time-consuming and not applicable anymore. 

In the following sections the algorithmic differentiation 
technique for implicitly defined functions is introduced on 
a general mathematical level. 

3. Algorithmic Differentiation 

Suppose the function F : M" M™, y = r(x) describ- 
ing an arbitrary algebraic mapping from K" to is de- 
fined by an evaluation procedure in a high-level computer 
language like Fortran or C. The technique of algorithmic 
differentiation provides derivative information of arbitrary 
order for the code segment in the computer that evalu- 
ates F(x) within working accuracy. For this purpose, the 
basic differentiation rules such as, e.g., the product rule 
are applied to each statement of the given code segment. 
This local derivative information is then combined by the 
chain rule to calculate the overall derivatives. Hence the 
code is decomposed into a long sequence of simple evalua- 
tions, e.g., additions, multiplications, and calls to elemen- 
tary functions such as sin(a;) or exp(a;), the derivatives of 
which can be easily calculated. Exploiting the chain rule 
yields the derivatives of the whole sequence of statements 
with respect to the input variables. 

As an example, consider the function F : 
with 

yi = sin(a;i * X2) 

y2 = X3 + cos(xi * X2) 

that can be evaluated by the pseudo-code given on the left 
column of Tab. [TJ On the right-hand side, the resulting 
statements for the derivative calculation y = F'(x)x are 
given. 



For the vector x — (1,0,0)"^ one obtains the first col- 
umn of the Jacobian VF(x) the vector y = (cos(xi * X2) * 
X2, — sin(a;i * X2) * 2^2)"^- Correspondingly, the other unit 
vectors in R"^ yield the other two remaining columns of the 
Jacobian F'(x). 

Table [1] illustrates the so-called forward mode of AD, 
where the derivatives are propagated together with the 
function evaluation. Alternatively, one may propagate the 
derivative information from the dependents y to the inde- 
pendents X yielding the so-called reverse mode of AD. 

Over the past decades, extensive research activities led 
to a thorough understanding and analysis of these two ba- 
sic modes of AD, where the complexity results with respect 
to the required runtime are based on the operation count 
Of, i-G., the number of floating point operations required 
to evaluate F(x), and the degree d of the computed deriva- 
tives. Using the forward mode, one computes the required 
derivatives together with the function evaluation in one 
sweep as illustrated above. The forward mode yields one 
column of the Jacobian VF at no more than three times Of 
One row of VF, e.g., the gradient of a scalar- valued 
component function of F, is obtained using the reverse 
mode in its basic form also at no more than four times Of 
[T] . It is important to note that this bound for the reverse 
mode is completely independent of the number n of input 
variables. This observation is called cheap gradient result. 

For the application discussed in the present work, the 
forward mode has been chosen for the efficient compu- 
tation of higher-order derivatives which is illustrated in 
the following paragraphs. To this end, we consider Taylor 
polynomials of the form 



where 



(10) 
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7! mi-^'^ 



t=0 



are scaled derivatives at t = 0. The expansion is trun- 
cated at the highest derivative degree d which is chosen by 
the user. The vector polynomial x(i) describes a path in 
R" which is parameterized by t. Thus, the first two vec- 
tors Xi and X2 represent the tangent and the curvature 
at the base point xq = x(0). Assuming that the function 
y = F(x) is sufficiently smooth, i.e., d times continuously 
differentiable, one obtains a corresponding value path 



Vl 


= Xi* X2 


Vl 


= ii * a;2 -1- * ±2 


V2 


= sin(ui) 


V2 


= cos(wi) * Vl 


V3 


— cos(wi) 


V3 


= — sin(wi) * Vl 


Va 


= X3 + V3 


Vi 


= is + "3 


Vl 


= V2 


Vl 


= V2 


y2 


= Vi 


m 


= V4 



Table 1: Function and derivative calculation. 
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J=0 



F{x{t)) + 0{t''+') e 



(11) 



The coefficient functions are uniquely and smoothly de- 
termined by the coefficient vectors X; with i < j. To com- 
pute this higher-order information, first we will examine 
for a given Taylor polynomial 



x(t) = xo + xi i + X2 t" 



3 



v{t) = 


Formula (1 < fc < d) 


OPS 


MOVES 


x{t)+y{t) 


Vk=xk+ yk 


- 2d 


3d 


x{t) * y{t) 


k 

J2 Xj * yk-j 
j=o 


^d^ 


3d 


Table 2: Taylo 


• coefficient propagation for 
Formula {1 < k < d) 


arithmet 

OPS 


ic operations 

MOVES 


exp(a;(t)) 


k 

kVk = J2 j^k-jXj 


^d^ 


2d 



Table 3: Taylor coefficient propagation for exponential 

the derivative computation based on "symbolic" differen- 
tiation. 

Let us generalize the previous relation y{t) = F(x(i)) 
given in Eq. (fTTjl , and consider now a general smooth func- 
tion v(i) ~ (p{x{t)) as for example the evaluation of a 
sin(.)-function. This function v(t) represents one of the 
intermediate values computed during the function evalua- 
tion as illustrated in Tabled] One obtains for the Taylor 
coefficients 



1 , , 



and tpj.) = ^cp^^\.) 



t=a J- 
the derivative expressions 

vo = ifi^o) 
Vl = v5i(xo)xi 

V2 = ¥'2(xo) Xi Xi + (/JlCxo) X2 

V3 = V'3(xo)xi Xi Xi -h2y>2(xo)xi X2 +(pi{xo)x3 
V4 = (p4(xo) Xi Xi Xi Xi + 3(p3(xo) Xi Xi X2 
+ ¥'2(xo) (X2X2 +2x1X3) + v?i(xo)x4 



Hence, the overall complexity grows rapidly with the de- 
gree d of the Taylor polynomial. To avoid these pro- 
hibitively expensive calculations the standard higher-order 
forward sweep of algorithmic differentiation is based on 
Taylor arithmetic [8] yielding an effort that grows like 
times the cost of evaluating y(x). This is quite obvious 
for arithmetic operations such as multiplications or addi- 
tions, where one obtains the recursion shown in Table [21 
where OPS denotes the total number of floating point op- 
erations and MOVES the total number of memory accesses 
required to compute all Taylor coefRcients wq, . . . , f^. For 
a general elemental function ip, one finds also a recursion 
with quadratic complexity by interpreting ip as solution of 
a linear ordinary differential equation as described in [ij. 
Table 12] illustrates the resulting computation of the Taylor 
coefficients for the exponential function Similar formulas 
can be found for all intrinsic functions. This fact permits 
the computation of higher-order derivatives for the vector 
function F(x) as composition of elementary components. 



The AD-tool ADOL-C ^] uses the Taylor arithmetic as de- 
scribed above to provide an efficient calculation of higher- 
order derivatives. 



4. Higher-order Derivatives of Implicit Functions 

4-.1. Basic Algorithm 

For the application considered here, higher-order deriva- 
tives of a variable y G M™ are required, where y is implic- 
itly defined as a function of some variable x G R"^™ by 
an algebraic system of equations 



G(z) 



G 



with z = (y, x) G 



Naturally, the n arguments of G need not be parti- 
tioned in this regular fashion. To provide flexibility for a 
convenient selection of the p = n ~ m truly independent 
variables x, let P G M^^" be a projection matrix with 
only or 1 entries that picks out these independent vari- 
ables. Hence, P is a column permutation of the matrix 
[0,lp] G RP^". Then the nonlinear system 

G(z) =0, Pz = X, 

has a regular Jacobian, wherever the implicit function the- 
orem yields y as a function of x. Therefore, we may also 
write with H : R" R" 



H(z) 



G(z) 
Pz 





Pz 



S X, 



(12) 



for the seed matrix S = [0,L, 



Now, we have 



rewritten the original implicit functional relation between 
X and z as an inverse relation H(z) = Sx. Assuming an 
H : R" 1-^- R" that is locally invertible we can evaluate the 
required derivatives of the implicitly defined z G R" with 
respect to x G R^ using the computation of higher-order 
derivatives described above in the following way. 

Starting with a Taylor expansion Eq. (jlOp of x and a 
corresponding solution z(x(t)) of Eq. (H^), one obtains for 
a sufficiently smooth H the representation 

d 

Sx = H(z(x(<))) = J2 Hj^-'' + 0(i'^+i). 

3=0 

Substituting the Taylor expansion of x into the previous 
equation yields 

d d 
S ^ Xjt^ ^ lijt^ + 0{t'^+^). 

3=0 j=0 

From the comparison of coefficients, it follows that 



Sx,i^ = n.t^ ^ Sx, = H, = (^^H(z(x(i))) 



(13) 
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As a next step, the structure of the Taylor coefhcients 
is analyzed. For the first three coefficients, one has 



Ho = H(z(x(0))) = H(zo) 



Hi 



|H(z(x(t))) 



Hz(zo)zi 



t=o 



f / 92 
H2=2(a^H(z(x(t))) 



t=0 



\ (|H.(z(x(t)))|z(xW) 
Hz(zo)z2 + iHz2;(zo)ziZi 



due to the definition of Zj, where Hz;(.) denotes the deriva- 
tive of H(.) with respect to its argument. For the higher- 
order coefficients, it is now shown that they have the struc- 
ture 



H, = l(^H.(z(x(i)))^z(t) 



H,(z(x(i)))|t=o 



(14) 

where Hj(z(x(t))) involves only derivatives of order j — 1 
with respect to t and hence 

Hj(z(x(t)))|t=o = Hj(zo, . . . ,Zj_i). 
For H2, one obtains 



H2(z(x(t))) = \ ('|H,(z(x(t)))) (|z(x(t)) 



Now, let the assumption hold for j — 1. Then, one obtains 
for j the equation 



«^ = 7! (|jH(^(xW)) 

-H(z(x(i))) 



j! \dtdP 
ij - 1)! 



t=0 

|fH.(z(xW))|-^z(x(t)) 



H,-i(z(x(t))) 



t=0 



l(^H,(z(x(t)))^z(x(t)) 



^H.(z(xW)) 



t=o 
-z(x(t)) 



f /_a 



-(-H,_i(z(xW)) 



t=o 



Due to the assumptions, the function 

H,(z(x(t))) = i (^|H.(z(x(t)))) (^^z(x(0) 

+ ;^|h,-i(z(x(0)) 



involves only derivatives of order j — f with respect to t 
since Hj_i(z(x(t))) does only contain derivatives of order 
j — 2 with respect to t. Therefore, is proven and it 
follows that 



Hj = Hz(zo)zj + Hj(zo 



J V^Ui ■ ■ ■ 



(15) 



due to the definition of Zj. Combining ([15)1 with (jT3)l . one 
obtains the equations 

Sxj =Hz;(zo)zj +Hj(zo,...,Zj_i) l<j<d 

and therefore 

z, = (H,(zo))-i(Sx, - H,(zo, . . . ,zj_i)) 1 < J < d 

where the Jacobian Hz;(z(x(i))) and its factorization can 
be reused as long as the argument z(x(t)) is the same. For 
this purpose, the Jacobian Hz;(z(x(t))) can be evaluated 
exactly by using the forward mode of AD. 

Therefore, it remains to provide the missing contribu- 
tions Hj(zo, . . . , Zj_i) to compute the desired Taylor co- 
efficients Zj . One starts with the Taylor expansion 

zo = z(x(0)), zi = (H,(zo))-ixi, zj=0 l<j<d. 

For j 2, . . . , d, one performs the following steps 

1. A forward mode evaluation of degree j. Since Zj — 
this yields only the contribution Hj(zo, . . . , Zj-i). 

2. One system solve 

Zj = (Hz;(zo))"\Sxj - Hj(zo, . . . ,Zj_i)) 

to compute Zj. 

This approach provides the complete set of Taylor co- 
efficients of the Taylor polynomial z = z(x) that is defined 
by a given Taylor polynomial (|f Op for x. These Taylor co- 
efficients of z = z(x) are computed for a considerably small 
number of Taylor polynomials x(t) to construct the desired 
full derivative tensor for the implicitly defined function z 
according to the algorithm proposed in ■ 

4-2. A Simple Example 

Consider the following two nonlinear expressions 

Gi{zi, Z2, 23,24) = zl + zl- zl 

6*2(2:1, 22, 2:3, 2:4) = COS (24) - 21/23 

describing the relation between the Cartesian coordinates 
(21, 22) and the polar coordinates (23, 24) in the plane. As- 
sume, one is interested in the derivatives of the second 
Cartesian and the second polar coordinate with respect to 
the first Cartesian and the first polar coordinate. Then one 
has 71 = 4, TO = 2, p = 2, x = (21,23), and y = (22,24). 
The corresponding projection and seed matrix are 



P 



f 
f 



and S'^ ^ 



f 
f 



Provided the argument z is consistent in that its Carte- 
sian and polar components describe the same point in the 
plane, one has G(z) = 0. In this simple case, one can 
derive for the implicitly defined functions yi = 22(^1, 2:3) 
and 2/2 = ^4 (2:1, 23) the desired derivatives exphcitly by 
symbolic manipulation: 



yi 



and j/2 — arccos (zi/zs). 



The derivatives up to order 3 of yi will be used to verify 
the results from the differentiation of the implicitly defined 
functions. These derivatives have the following represen- 
tation: 



dyi 
dzi 

dyi 

dz3 

dzl 

d^yi 
dzidzz 

d'^yi 
dzi 

d^yi 
dzl 

dzfdz3 

d^yi 
dzidz^ 

d^yi 



'Zl 



Z3 



Z1Z3 

{zi-zf)l 
^2 



{z? 



(z? 



3-4)^ 
3zf 

3 ~ 4)' 
3z?z3 



V4 - 4 

3zi 



(16) 



(4 



+ 



Z3 



i4-zf)i ( 



Zl 



3ziz^ 



zf)i 

.2 
3 



(z? 



dzl 



3 ~ 4)' 

-3z3 

(4 - 4Y- 



{4 



3z| 



{4 



As shown in [1(J|, these derivatives can be computed 
efficiently and exactly from a considerable small number 
of univariate Taylor expansions like the one in Eq. (|lip . 
Furthermore, this article also proposes a specific choice of 
the employed Taylor polynomials. For the example con- 
sidered here, i.e., m — 2 and d = 3, one obtains the Taylor 
expansions 



zo = (4,3,5,0.6435)^ 



/ 



z = zo 



3\ 

-4 



-1 / 

-1 



t + 



25 
6 



2 
3 



V-l / 



v-iy 



t+ 



\ 

_ 2 

3 




<2 + 



\ 

50 

9 



19 
18 





2 
9 



46 



4b I 

1125 / 



= Zo 



f'\ ( 

2 
2 



t + 



Zo + 



5 



t + 



V I J 



6 



3 


68 
73 



9 


\ 2250 / 



/ \ 



\-%J 





40 
9 



1508 



\ 1508 I 
\ 1125 / 



From these numerical values, one can derive the desired 
derivatives in Eq. (|16p as given below 



dyi 
dzi 
dyi 

dZ3 

d^yi 
dzl 

d'^yi 
dzidzz 

dhi 

dzl 

d^yi 
dz\ 

d^yi 
dz\dz3 



1 



1 



^12 



(-4) 



■'12 



= - * 5 



25 _ 2 1 



27 9 



20 
27 



16 

'27 



36 

2 . 



9 ' 
100 2 



2 

— * 
9 



25 
1 



3 

50 



36 



^ ~Z32 — - * - 



95 
81 



9 

_5_ 
27 



^32 



9 
2 



=■32 



=■32 



27 



^32 




Then, the procedure described in the previous section yields 
the following Taylor expansions of z(x) with the base point 



d^yi 
dzidzl 

d^yi 



81 27 



^32 



80 
81 



^32 



3-32 

2 40 
9 * ¥ 



^32 



27 



^32 



where z^ ^ denotes the /th component of the kth Taylor 
coefficient of the Taylor expansion j . As can be seen, the 
required 24 entries of the first three derivative tensors can 
be obtained from four univariate Taylor expansions. This 
computation of tensor entries from the Taylor expansions, 
i.e. the exact coefficients for the Taylor coefficients, is 
derived and analyzed in detail in [loj . 



4-3. Remarks on Efficiency 

In the procedure described above, the higher-order forward 
mode of AD is apphed for each value of j for j = 2, . . . , d. 
Employing in addition to the higher-order forward mode 
the higher-order reverse mode, the number of forward and 
reverse sweeps can be reduced to \og2{d). In this case, the 
values of the required Hj(zo, . . . , Zj_i) is reconstructed 
from the information available due to the reverse mode 
differentiation. The AD-tool ADOL-C provides a corre- 
sponding efficient implementation of this algorithm and 
will be used in the numerical tests below, where the log2- 
behavior of the higher-order derivative calculation can be 
observed in the measured runtimes. 



5. Applications 

In the following the previous general mathematical de- 
scription of the AD technique is applied to the model ex- 
ample introduced in the beginning. Furthermore, the AD 
results are then compared to those obtained with the stan- 
dard divided differentiation (DD) method. 

5.1. Algorithmic Differentiation applied to the model 

The first step for the calculation of the fc-th order 
derivatives of the grand potential with respect to /i, 

£^niT,^,)^^n{T,^l,a,iT,^l),asiT,^,)) , (i?) 

by means of the AD technique requires a suitable formu- 
lation of Cl{T,fi). This can be accomplished by a Taylor 
expansion of the condensates aq^s(T, /j,). The required co- 
efficients, i.e., the derivatives 



is introduced. The n — 3 dimensional argument z = 
(y(a;),a;) splits into m = 2 imphcitly defined functions 



q,s 



(18) 



can be calculated by applying the technique described in 
the previous section for implicit functions. The next step 
consists in the calculation of the derivatives of ^{T, fi) 
w.r.t. fj, by using the Taylor expansions of the conden- 
sates. In the following the procedure will be exemplified 
in detail. 

In this example only one, i.e. p — 1, truly independent 
variable a; = ^ is considered and the temperature T plays 
the role of a constant parameter. The generalization to 
mixed derivatives with respect to T and /i can also be 
realized but is omitted for simplicity. 

Firstly, the Taylor coefficients for the condensates at 
are needed. This is done via the inverse Taylor expansion 
capabilities of ADOL-C. For that purpose the following 
function 



G(z) 



/ dQ{T,fi-crg.,cy,) 



da. 



Ts — a s 



y = (cr,,crs) andp: 



1 truly independent variable 



(19) 



X — fj,, cf. Eq. p2p . Furthermore, the projection matrix 
reads P= (0,0,1) € R^^". 

In order to obtain the /i-derivatives of the functions ai 
for fixed values of (T, fj.) = {Tq, /xq) the following steps are 
required: 

1. The numerical solution of the EoM, see Eq. yields 
the values of the condensates (Ti(To,/io) at the po- 
tential minimum. For these values the condition 
G(zo) = with zo{d-q{To,fj,o),as{To,fio),To,fio) is 
obviously valid. 

2. Prepare the derivative calculation for H(zo) using 
ADOL-C. 

3. Evaluate the Taylor coefficients of ai{T, jj) at (Tq, /xq) 
up to the highest derivative degree d desired by the 
user. 

From now on, the Taylor expansions of ai{T,fi) around 
(To,/io) are labeled as af^'''^'°''^\T, fi). These Taylor ex- 
pansions are inserted in the grand potential which leads 
to the definition 

(20) 

The function D, is exact in the explicit /i-dependence 
but only exact up to order d in the implicit dependence. 
Thus, the fc-order derivatives of correspond to the deriva- 
tives of the original if the derivatives are evaluated at 
the expansion point (Tq, /iq) and k < d, i.e., we have 

^h^'^'^''°^''\T^,^,^) = ^n{T^,^i^) iork<d. (21) 

This equation is valid only at the point (To,/xo). In order 
to obtain the desired i7 derivatives at another (T, jj) point, 
the expansion coefficients of dq^s have to be recalculated 
for each (T,/!) point. 

However, this reduces the problem of calculating the 
derivatives of H with only implicitly known functions ai to 
the calculation of the Q derivatives with explicitly known 

Finally, the calculation of the derivatives of (l then 
requires two steps: 

1. Prepare the derivative calculation for fl with ADOL- 
C at (To, fio) which were chosen for the evaluation of 
the coefficients of (t^. 

2. Evaluate the Taylor coefficients of (l. 

5.2. Divided Differences 

Another method to approximate derivatives of a func- 
tion is based on divided differences which is explained in 
the following. 
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Based on the definition of tlie derivative of a function 
/ at a point x 

h^O h 

the simplest linear approximation for f'{x) is obtained by 
calculating the right-hand side of Eq. ((22)) for a small but 
finite value of h 

« fi- + h)^-m . (23) 

The problem with this approximation is that it involves 
two types of errors (cf. e.g. [Hj). If h is too large, the 
so-called truncation error induced by the used approxi- 
mation or algorithm to calculate the derivative becomes 
significant. On the other side, when h becomes too small 
another error, the rounding error yields cancellations in 
the enumerator of and spoils the quality of the ap- 
proximation. 

Since the two error sources compete with each other, 
one has to find an optimal value of h for which the nu- 
merical error of the derivative evaluation is smallest. In 
general, this optimal h varies with x, the point at which 
the derivative is calculated. 

The truncation error is relatively easy to control. By 
comparing Eq. (j23p with a Taylor expansion for f{x + h) 
around x one sees that the truncation error of the lin- 
ear approximation is of 0{h), i.e., the error is a linear 
function of h. Thus, decreasing h will also decrease the 
truncation error. In addition, by increasing the degree of 
the expansion the truncation error can also be further im- 
proved. One such improved extrapolation, the Richardson 
expansion, is based on the n-th order Taylor expansion for 
f{x + h) and f{x — h) around x for which a truncation 
error of the order 0{h^") can be derived. By repeating 
the algorithm for the determination of the truncation er- 
ror, a better approximation for the first derivative f'{x) 
can be obtained. Similar improvements of the truncation 
error for higher-order derivatives are also known. 

As an example the corresponding approximations for 
the second derivative f"{x) with three grid points Xi 

/"(^) = ^ [/(^i) - 2/(a;2) + /(X3)] + 0{h') (24) 
and with five grid points 

/"(^) - h/K) + 16/(xi) 

-30f{x2) + 16/(x3) - /(X4)] + 0{h^) (25) 
are itemized. The grid points are given by 

xq — X — 2h, xi — X — h, xi= X 
Xj, = x + h, X4 = X + 2h . 

For completeness the fourth-order derivative is quoted 

r'ix)^^[f{xo)-ifix^) + 6f{x2) 

-4/(a;3) + fix^)] + 0{h^) . (26) 



where at least five grid points are needed for its calculation. 

The disadvantage of such type of improvements is that 
the function has to be evaluated at several different grid 
points Xi which are located in the vicinity of x. 

The other error source, the rounding error, depends 
on the used format of the floating point number repre- 
sentation in the computer. A single precision IEEE float- 
ing point number is stored in a 32-bit word, where 8 bits 
are used for the biased exponent and the fractional part 
of the normalized mantissa is a 23-bits binary number. 
One bit in the IEEE format is always reserved for the sign 
of the number. A double precision number occupies 64 
bits, with the biased exponent stored in 11 bits and the 
fractional part is stored on the remaining 52 bits. Thus, 
besides the fact that one can represent only a finite sub- 
set of all real numbers, all floating point calculations are 
furthermore rounded resulting in incorrect values. The 
smallest positive number e, where the floating point ap- 
proximation for H- e is indeed larger than one is called the 
machine precision. When one rounds to the nearest repre- 
sentable number the machine precision is roughly e 2~"^ 
where m is the number of bits used to store the mantissa's 
fraction. For a single precision representation one finds 
e~2^^'^^10^^ and for a double precision number calcu- 
lation e ~ 2^^^ ^ 10^^^. This means that single precision 
numbers have at most about 7 accurate digits while dou- 
ble precision numbers have about 16 accurate digits. But 
in general, due to the error propagation during the appli- 
cation of approximate algorithms the number of accurate 
digits for a numerical solution decreases. Therefore, the 
rounding error will be several orders of magnitude larger 
for a more complicated calculation such as the one for 
the thermodynamic potential. To minimize this source of 
error in the derivative calculation of the thermodynamic 
potential, a larger value of h is reasonable. 

In order to estimate these numerical errors and ver- 
ify the quality of the ADOL-C evaluations the results of 
the derivative calculation obtained with AD are confronted 
with the DD method. 

In Figs.[T]and[2]the results of a DD evaluation as a func- 
tion of h in comparison with the AD calculation for the 
second-order and fourth-order derivative of the thermody- 
namic potential are shown. Fig. [1] shows the /^-derivatives 
of the potential evaluated at (T, fi) = (183, 0) MeV which 
is close to the crossover phase transition in the (T, /i) phase 
diagram. One can clearly see the competition of the trun- 
cation and rounding errors. For the second-order deriva- 
tive the optimal value is around h ^ 0.05 MeV while for the 
fourth-order derivative a slightly larger value /i ~ 0.1 MeV 
leads to more stable results. In Fig.[2]the same derivatives 
are calculated at the point (T, fi) = (63, 327) MeV which 
is near the critical end point in the phase diagram. While 
in the previous Fig. [1] the rounding error dominates, the 
truncation error is now more important. For the second- 
order derivative almost no rounding error is visible in the 
resolution shown. Since the truncation error for the five- 
point expression, Eq. (PS)) . is of the order O (/i'') and of 



8 



> 

(U 



o 

00 

in 
o 

00 



a 

T3 



1.00002 p 
1.00001 
1 

0.99999 
0.99998 
0.99997 

I 

0.99996 - 



five-point DD 
three-point DD 
AD 



0.001 0.01 

h [MeV] 

(a) Second-order derivative 



o 



a 

■D 



0.1 



1.4 
1.2 
1 

0.8 
0.6 



five-point DD 
AD 



• • • 



0.001 0.01 

fi [IVIeV] 

(b) Fourth-order derivative 



0.1 



Figure 1: Comparison of the second- (left) and fourth-order (right) derivatives of the grand potential Q, evaluated at (T, n) = (183, 0) MeV, 
with respect to fi using three points and five points DD with the AD technique. AH DD results arc normalized to the AD results. 
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Figure 2: Similar to Fig. [T] but Q evaluated at another point in the phase diagram, (T, fi) = (63,327) MeV. 



the order O {h^) for the corresponding three-point equa- 
tion, Eq. ([M)) . the results of the five-point derivative is 
indistinguishable already for h ^ 0.1 MeV while for the 
three-point formula a smaller value of /i ~ 0.02 MeV is re- 
quired. For the fourth-order derivative the interval where 
the derivative does not vary with h is very small. Only for 
h^QM MeV the DD resuh is close to the AD result. 

In summary, one realizes that the DD derivatives re- 
quire a very careful fine-tuning of the h value. The DD 
result coincides always with the DD results where the h 
variation vanishes. One finds that the AD technique is 
more efficient than the DD method. The DD calculation 
always requires the evaluation of the function at several 
points, e.g., for the fourth derivative five function evalua- 
tions are necessary. In our case this involves the solution 
of the EoM at these five nodes. This is a time-consuming 
disadvantage of the DD method. With the AD the EoM 
need to be solved only once. Despite the fact that it is 
required to generate an internal function representation of 
the evaluation of the EoM solution and of the thermody- 
namic potential inside of ADOL-C, the AD implementa- 
tion is much faster. Corresponding runtime measurements 
are illustrated in Fig. [31 
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Figure 3: Runtime comparison for the DD and AD approaches. 



The runtime of the DD approach can be described by 
the linear function f{d) = rn * d + a where as the AD 
runtime performs like g{d) = c * \og2{d) + b. This result 
fits perfectly to the computational complexity of the AD 
approach described in Sec. [ 
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6. Taylor coefficients 



tives of the pressure 



As previously illustrated, both error sources for a deriva- 
tive calculation with the DD method are in general difficult 
to keep under control, in particular, if higher-order deriva- 
tives are involved. However, with the AD method it is 
possible to obtain higher-order derivatives with very high 
precision. In the following an explicit example is given 
within the already introduced linear sigma model. 

Higher derivatives are required if one is interested, e.g., 
in the extrapolation of Monte Carlo lattice simulations of 
strongly interacting matter (lattice gauge theory) to finite 
quark chemical potential. At finite quark chemical poten- 
tial such types of Monte Carlo simulations cannot be di- 
rectly performed . One possible extrapolation to finite 
quark chemical potential is based on a Taylor expansion 
around zero chemical potential l^, 

For this purpose, we consider the same kind of expan- 
sion in the quark-meson system described by the LcrM. 
An example is given by the coefficients in the expansion of 
the pressure p which is related to the thermodynamic po- 
tential via p(T, /i) — —U, (T, /x) . At fixed temperature and 
small values of the quark chemical potential the pressure 
may be expanded in a Taylor series around /i 0, 



rp4 



oo 

E^»(^)(f)"' (27) 



n=0 



c„(T) 



1 9»(p(r,/i)/T4) 



n! d{^i/TY' 



(28) 



where the expansion coefficients are given in terms of deriva- 



The series is even in (/i/T) which reflects the invariance of 
the partition function under the exchange of particles and 
antiparticlcs. 

In Fig. m the expansion coefficients C6(T) to C22{T) are 
shown as function of the scaled temperature T/Tq. Here, 
To is the pseudocritical temperature at which the crossover 
transition occurs for vanishing chemical potential. Since 
the first three expansion coefficients co,C2 and C4 are al- 
ready known and well-understood we do not show them 
again [l3|. In lattice gauge theory one can currently cal- 
culate the first five coefficients, cq, . . . , cg [lH. 

The higher coefficients c„ with n > 4 vanish for tem- 
peratures basically outside of a five percent window around 
Tq. Thus, all coefficients are only shown in the range 
0.9 < T/Tq < 1.1. All curves are smooth oscillating func- 
tions around zero even up to the 22"'^ derivative order. 
The amplitude of the oscillation and the number of roots 
around Tq increases with the order n. Thus, this oscillat- 
ing behavior of the coefficients obviously requires a smaller 
h in order to decrease the truncation error but then the 
rounding error increases. Already in this example the error 
sources are dramatic for such a high degree of derivatives. 
Therefore it is not reasonable and actually not possible 
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to obtain the higher eoefhcients with standard techniques 
such as the DD method. 

7. Summary 

A novel numerical technique, which is based on al- 
gorithmic differentiation, for the calculation of arbitrar- 
ily high-order and high-precision derivatives has been pre- 
sented. The new feature of the technique is the additional 
treatment of implicitly defined functions. In addition, the 
basic concepts of the algorithmic differentiation for explicit 
dependencies is discussed. 

As a demonstration of the successful extension to im- 
plicitly defined functions the AD technique is applied to 
a quantum-field theoretical model for strongly-interacting 
matter. In this model the implicitly defined functions are 
represented by the underlying equations of motion where 
the implicitly defined order parameter is known only nu- 
merically. 

Two important error sources namely, the rounding and 
truncation error, for a derivative calculation in general are 
discussed in detail. Furthermore, the results with the im- 
proved AD method are confronted to those obtained by 
standard divided difference (DD) methods. In the com- 
parison the rounding and truncation errors can clearly be 
identified. While for a second-order derivative calculation 
the error sources are still controllable, they become in- 
tractable for higher orders. 

In the model example higher-order derivative coeffi- 
cients of a Taylor expansion for the pressure are calcu- 
lated up to 22"^^ order. Since these coefficients are calcu- 
lated for the first time, no comparison with other results 
can be performed. The obtained curves are very stable 
and smooth functions which demonstrates the power of 
the novel AD technique. In a forthcoming publication [l^ 
this method will be applied to the more realistic Polyakov- 
Quark-Meson model for three quark flavors [l^ . 

The presented AD technique augmented by implicitly 
defined dependencies can be applied to a wide class of 
problems, where high-order derivatives are involved. Stan- 
dard alternative methods for the derivative calculation such 
as the DD method fail due to uncontrollably increasing 
errors. Especially, in the case of only numerically known 
implicit dependencies, an analytic solution is actually not 
possible. Here, the AD method is still applicable and dis- 
plays its exceptional impact. 
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